Identification of Document Types from Various Kinds of Document Images Based on Physical and Layout Features
Abstract
When developing a general-purpose document image understanding system, an important first step is to distinguish individual documents. We propose an approach that first classifies document images into distinct types and then interprets each one precisely using an appropriate document model. In this paper, we define groups of documents and describe a classification method built on a verification mechanism that uses the physical and layout features of documents. We also report experimental results for the method.
Similar resources
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system that segments and classifies the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction and document classification. In preprocessing, a document image is enhanced by removing noise, binarizing, and detecting and correcting image skew. In document segmentation, an algorith...
Learning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...
A Qualitative Study Analyzing How Health Experts Use Medical Images
Introduction: In the health sector, an image functions as a form of document that can convey a considerable amount of information. Using this type of information can make the work of medical experts more effective. This study aimed to survey how health experts use medical images in their practice. Methods: This applied qualitative study was carried out in 1392 (2013). The study p...
FONT DISCRIMINATION USING FRACTAL DIMENSIONS
One problem related to OCR systems is the discrimination of fonts in machine-printed document images. Solving this task improves the performance of general OCR systems. The methods proposed in this paper use various fractal dimensions for font discrimination. First, some predefined fractal dimensions were combined with directional methods to enhance font differentiation. Then, a novel fractal dime...
Document Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)
Document images produced by a scanner or digital camera usually suffer from geometric and photometric distortions, both of which degrade the performance of OCR systems. In this paper, we present a novel method that compensates for undesirable geometric distortions in order to improve OCR results. Our methodology is based on finding text lines using a dynamic local connectivity map and then applying a l...
Publication date: 1996